
    Semi-Supervised Deep Regression with Uncertainty Consistency and Variational Model Ensembling via Bayesian Neural Networks

    Deep regression is an important problem with numerous applications, ranging from computer vision tasks such as age estimation from photographs to medical tasks such as ejection fraction estimation from echocardiograms for disease tracking. However, semi-supervised approaches for deep regression are notably under-explored compared to classification and segmentation tasks. Unlike classification tasks, which rely on thresholding functions for generating class pseudo-labels, regression tasks use real-number target predictions directly as pseudo-labels, making them more sensitive to prediction quality. In this work, we propose a novel approach to semi-supervised regression, namely Uncertainty-Consistent Variational Model Ensembling (UCVME), which improves training by generating high-quality pseudo-labels and uncertainty estimates for heteroscedastic regression. Given that aleatoric uncertainty depends only on the input data by definition and should therefore be equal for identical inputs, we present a novel uncertainty consistency loss for co-trained models. Our consistency loss significantly improves uncertainty estimates and allows higher-quality pseudo-labels to be assigned greater importance under heteroscedastic regression. Furthermore, we introduce a novel variational model ensembling approach to reduce prediction noise and generate more robust pseudo-labels. We show analytically that our method generates higher-quality targets for unlabeled data and further improves training. Experiments show that our method outperforms state-of-the-art alternatives on different tasks and can be competitive with supervised methods that use full labels. Our code is available at https://github.com/xmed-lab/UCVME.
    Comment: Accepted by AAAI2
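    As a rough illustration of the two ideas above, the sketch below shows how an uncertainty consistency term and ensembled pseudo-labels might be wired together in PyTorch. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names are invented, each model is assumed to return a (mean, log-variance) pair, and the ensembling is approximated by averaging several stochastic (e.g. MC-dropout) forward passes.

```python
import torch
import torch.nn.functional as F

def heteroscedastic_nll(mean, log_var, target):
    # Gaussian negative log-likelihood with predicted mean and variance;
    # low-variance (confident) predictions receive larger weight.
    return 0.5 * (torch.exp(-log_var) * (mean - target) ** 2 + log_var).mean()

def ucvme_style_losses(model_a, model_b, x_labeled, y_labeled, x_unlabeled, n_samples=5):
    # Supervised heteroscedastic regression loss for both co-trained models.
    mean_a, log_var_a = model_a(x_labeled)
    mean_b, log_var_b = model_b(x_labeled)
    sup_loss = heteroscedastic_nll(mean_a, log_var_a, y_labeled) + \
               heteroscedastic_nll(mean_b, log_var_b, y_labeled)

    # Uncertainty consistency: aleatoric uncertainty depends only on the input,
    # so the two models' predicted log-variances are pulled toward each other.
    unc_consistency = F.mse_loss(log_var_a, log_var_b)

    # Ensembling: average several stochastic forward passes of both models
    # (dropout active in training mode) to get lower-noise pseudo-labels.
    with torch.no_grad():
        means, log_vars = [], []
        for _ in range(n_samples):
            for model in (model_a, model_b):
                m, lv = model(x_unlabeled)
                means.append(m)
                log_vars.append(lv)
        pseudo_y = torch.stack(means).mean(dim=0)
        pseudo_log_var = torch.stack(log_vars).mean(dim=0)

    # Unlabeled loss: pseudo-labels with lower ensembled variance count more.
    mean_ua, _ = model_a(x_unlabeled)
    mean_ub, _ = model_b(x_unlabeled)
    unsup_loss = heteroscedastic_nll(mean_ua, pseudo_log_var, pseudo_y) + \
                 heteroscedastic_nll(mean_ub, pseudo_log_var, pseudo_y)

    return sup_loss, unc_consistency, unsup_loss
```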

    Radiomics-Informed Deep Learning for Classification of Atrial Fibrillation Sub-Types from Left-Atrium CT Volumes

    Atrial Fibrillation (AF) is characterized by rapid, irregular heartbeats and can lead to fatal complications such as heart failure. The disease is divided into two sub-types based on severity, which can be classified automatically from CT volumes to screen for severe cases. However, existing classification approaches rely on generic radiomic features that may not be optimal for the task, whilst deep learning methods tend to over-fit to the high-dimensional volume inputs. In this work, we propose a novel radiomics-informed deep-learning method, RIDL, that combines the advantages of deep learning and radiomic approaches to improve AF sub-type classification. Unlike existing hybrid techniques that mostly rely on naïve feature concatenation, we observe that radiomic feature selection methods can serve as an information prior, and propose supplementing low-level deep neural network (DNN) features with locally computed radiomic features. This reduces DNN over-fitting and allows local variations between radiomic features to be better captured. Furthermore, we ensure that complementary information is learned by the deep and radiomic features by designing a novel feature de-correlation loss. Combined, our method addresses the limitations of deep learning and radiomic approaches, outperforming state-of-the-art radiomic, deep learning, and hybrid approaches and achieving 86.9% AUC on the AF sub-type classification task. Code is available at https://github.com/xmed-lab/RIDL.
    Comment: Accepted by MICCAI2
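    The de-correlation idea can be illustrated with a short sketch. Below is one plausible form of such a loss (a batch-wise cross-correlation penalty between the deep and radiomic feature branches) together with a simple fusion head; the names and the exact formulation are illustrative assumptions, not the paper's definition.

```python
import torch

def decorrelation_loss(deep_feats, radiomic_feats, eps=1e-8):
    # Penalize linear correlation between each deep feature and each radiomic
    # feature so the two branches learn complementary information.
    d = deep_feats - deep_feats.mean(dim=0, keepdim=True)
    r = radiomic_feats - radiomic_feats.mean(dim=0, keepdim=True)
    d = d / (d.std(dim=0, keepdim=True) + eps)
    r = r / (r.std(dim=0, keepdim=True) + eps)
    cross_corr = d.t() @ r / d.shape[0]          # (n_deep, n_radiomic)
    return (cross_corr ** 2).mean()

class HybridClassifier(torch.nn.Module):
    # Minimal fusion head: selected radiomic features are appended to pooled
    # DNN features before the final classification layer.
    def __init__(self, n_deep, n_radiomic, n_classes=2):
        super().__init__()
        self.fc = torch.nn.Linear(n_deep + n_radiomic, n_classes)

    def forward(self, deep_feats, radiomic_feats):
        fused = torch.cat([deep_feats, radiomic_feats], dim=1)
        return self.fc(fused)
```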

    MFR-Net: Multi-faceted Responsive Listening Head Generation via Denoising Diffusion Model

    Face-to-face communication is a common scenario involving the roles of speaker and listener. Most existing research focuses on producing speaker videos, while the generation of listener heads remains largely overlooked. Responsive listening head generation is an important task that aims to model face-to-face communication by generating a listener head video given a speaker video and a listener head image. An ideal generated responsive listening video should respond to the speaker by expressing attitudes or viewpoints, while maintaining diversity in interaction patterns and accuracy in listener identity. To achieve this goal, we propose the Multi-Faceted Responsive Listening Head Generation Network (MFR-Net). Specifically, MFR-Net employs a probabilistic denoising diffusion model to predict diverse head pose and expression features. To produce multi-faceted responses to the speaker video while preserving accurate listener identity, we design a Feature Aggregation Module that boosts listener identity features and fuses them with other speaker-related features. Finally, a renderer fine-tuned with an identity consistency loss produces the final listening head videos. Extensive experiments demonstrate that MFR-Net achieves multi-faceted responses not only in diversity and identity preservation but also in attitude and viewpoint expression.
    Comment: Accepted by ACM MM 202
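    The diffusion-based prediction of pose and expression features can be sketched generically. The snippet below shows a standard DDPM-style training step conditioned on speaker features; the denoiser signature, feature shapes, and noise schedule are assumptions made for illustration, not details taken from MFR-Net.

```python
import torch
import torch.nn.functional as F

def diffusion_training_step(denoiser, listener_motion, speaker_feats, n_steps=1000):
    # One DDPM-style training step: corrupt the listener's pose/expression
    # features with Gaussian noise and train the denoiser, conditioned on
    # speaker features, to predict that noise.
    device = listener_motion.device
    betas = torch.linspace(1e-4, 0.02, n_steps, device=device)
    alphas_cumprod = torch.cumprod(1.0 - betas, dim=0)

    t = torch.randint(0, n_steps, (listener_motion.shape[0],), device=device)
    a_bar = alphas_cumprod[t].view(-1, *([1] * (listener_motion.dim() - 1)))
    noise = torch.randn_like(listener_motion)
    noisy = a_bar.sqrt() * listener_motion + (1.0 - a_bar).sqrt() * noise

    # Assumed denoiser signature: (noisy features, timestep, speaker condition) -> noise estimate.
    pred_noise = denoiser(noisy, t, speaker_feats)
    return F.mse_loss(pred_noise, noise)
```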

    OSM-Net: One-to-Many One-shot Talking Head Generation with Spontaneous Head Motions

    One-shot talking head generation has no explicit head movement reference, so it is difficult to generate talking heads with head motions. Some existing works only edit the mouth area and generate otherwise still talking heads, leading to unrealistic results. Other works construct a one-to-one mapping between the audio signal and head motion sequences, which introduces ambiguous correspondences, since people can move their heads differently when speaking the same content. Such a mapping fails to model this diversity and produces either nearly static or exaggerated head motions, which look unnatural. One-shot talking head generation is therefore an ill-posed one-to-many problem: people present diverse head motions when speaking. Based on this observation, we propose OSM-Net, a one-to-many one-shot talking head generation network with natural head motions. OSM-Net constructs a motion space that contains rich and varied clip-level head motion features. Each basis of the space represents a meaningful head motion over a clip rather than a single frame, thus providing more coherent and natural motion changes in talking heads. The driving audio is mapped into the motion space, around which various motion features can be sampled within a reasonable range to achieve the one-to-many mapping. In addition, a landmark constraint and time-window feature input improve expression feature extraction and video generation. Extensive experiments show that OSM-Net generates more natural and realistic head motions under a reasonable one-to-many mapping paradigm compared with other methods.
    Comment: Paper Under Review
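    The one-to-many sampling around the audio-mapped point can be pictured in a few lines of code. This is a hedged illustration only: the mapping network, the basis matrix, and the Gaussian perturbation radius are stand-ins, not OSM-Net's actual components.

```python
import torch

def sample_head_motions(audio_feat, audio_to_motion, motion_basis, radius=0.1, n_samples=3):
    # Map the driving audio to a point in the clip-level motion space, then draw
    # several nearby points so that one audio clip yields several plausible,
    # coherent head motion sequences (the one-to-many mapping).
    center = audio_to_motion(audio_feat)                          # coordinates in motion space
    samples = center + radius * torch.randn(n_samples, center.shape[-1])
    # Decode each sampled coordinate vector as a combination of motion bases.
    return samples @ motion_basis                                 # (n_samples, motion_dim)
```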

    FONT: Flow-guided One-shot Talking Head Generation with Natural Head Motions

    One-shot talking head generation has received growing attention in recent years, with various creative and practical applications. An ideal, natural, and vivid generated talking head video should contain natural head pose changes. However, it is challenging to map head pose sequences from the driving audio, since there is a natural gap between the audio and visual modalities. In this work, we propose a Flow-guided One-shot model that achieves NaTural head motions (FONT) in generated talking heads. Specifically, a head pose prediction module is designed to generate head pose sequences from the source face and the driving audio. We add a random sampling operation and a structural similarity constraint to model the diversity of the one-to-many mapping between the audio and visual modalities, thus predicting natural head poses. We then develop a keypoint predictor that produces unsupervised keypoints from the source face, driving audio, and pose sequences to describe the facial structure. Finally, a flow-guided occlusion-aware generator is employed to produce photo-realistic talking head videos from the estimated keypoints and the source face. Extensive experimental results show that FONT generates talking heads with natural head poses and synchronized mouth shapes, outperforming other compared methods.
    Comment: Accepted by ICME202
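    Flow-guided, occlusion-aware generation is commonly built on dense warping of the source image. The sketch below shows such a warping block in PyTorch; it is a generic building block under assumed tensor conventions, not FONT's actual generator.

```python
import torch
import torch.nn.functional as F

def flow_warp_with_occlusion(source, flow, occlusion_mask):
    # Warp the source face by a predicted dense flow field and blend with an
    # occlusion mask, a common building block of flow-guided generators.
    # source: (B, C, H, W); flow: (B, 2, H, W) in pixels (dx, dy);
    # occlusion_mask: (B, 1, H, W) with values in [0, 1].
    b, _, h, w = source.shape
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    base_grid = torch.stack((xs, ys), dim=0).float().to(source.device)   # (2, H, W)
    coords = base_grid.unsqueeze(0) + flow                               # absolute sampling positions
    # Normalise to [-1, 1] for grid_sample, which expects (B, H, W, 2) in (x, y) order.
    coords_x = 2.0 * coords[:, 0] / (w - 1) - 1.0
    coords_y = 2.0 * coords[:, 1] / (h - 1) - 1.0
    grid = torch.stack((coords_x, coords_y), dim=-1)
    warped = F.grid_sample(source, grid, align_corners=True)
    # Occluded regions are down-weighted so the generator can inpaint them.
    return warped * occlusion_mask
```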

    OPT: One-shot Pose-Controllable Talking Head Generation

    One-shot talking head generation produces lip-synced talking heads from arbitrary audio and a single source face. To guarantee naturalness and realism, recent methods pursue free pose control rather than simply editing mouth areas. However, existing methods do not accurately preserve the identity of the source face when generating head motions. To solve this identity mismatch problem and achieve high-quality free pose control, we present the One-shot Pose-controllable Talking head generation network (OPT). Specifically, the Audio Feature Disentanglement Module separates content features from the audio, eliminating the influence of speaker-specific information contained in arbitrary driving audio. The mouth expression feature is then extracted from the content feature and the source face, with a landmark loss designed to enhance the accuracy of facial structure and identity preservation. Finally, to achieve free pose control, controllable head pose features from reference videos are fed into the Video Generator along with the expression feature and source face to generate new talking heads. Extensive quantitative and qualitative experiments verify that OPT generates high-quality pose-controllable talking heads with no identity mismatch, outperforming previous SOTA methods.
    Comment: Accepted by ICASSP202
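    One common way to encourage content/speaker disentanglement of audio features is an adversarial speaker classifier with gradient reversal, sketched below. This is an illustrative mechanism under that assumption; the paper's Audio Feature Disentanglement Module may well be implemented differently.

```python
import torch

class GradReverse(torch.autograd.Function):
    # Identity in the forward pass, negated gradient in the backward pass.
    @staticmethod
    def forward(ctx, x):
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -grad_output

def disentangle_loss(content_encoder, speaker_classifier, audio_feat, speaker_id):
    # Content features are trained so that a speaker classifier fails on them,
    # pushing speaker-specific information out of the content branch.
    content = content_encoder(audio_feat)
    speaker_logits = speaker_classifier(GradReverse.apply(content))
    return torch.nn.functional.cross_entropy(speaker_logits, speaker_id)
```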

    On the validity of the local Fourier analysis

    Local Fourier analysis (LFA) is a useful tool in predicting the convergence factors of geometric multigrid methods (GMG). As is well known, on rectangular domains with periodic boundary conditions this analysis gives the exact convergence factors of such methods. In this work, using the Fourier method, we extend these results by proving that such analysis yields the exact convergence factors for a wider class of problems.
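    As a concrete reminder of what LFA computes, the standard textbook example below works out the smoothing factor of weighted Jacobi for the 1D Poisson problem; it is included for orientation only and is not taken from the paper.

```latex
% Weighted Jacobi applied to -u'' = f with stencil \frac{1}{h^2}[-1 \;\; 2 \;\; -1].
% The LFA symbols of the operator and of the smoother's error propagation are
\[
  \tilde{L}_h(\theta) = \frac{1}{h^2}\bigl(2 - 2\cos\theta\bigr), \qquad
  \tilde{S}_h(\theta) = 1 - \frac{\omega}{2}\bigl(2 - 2\cos\theta\bigr)
                      = 1 - 2\omega\sin^2\!\bigl(\tfrac{\theta}{2}\bigr),
  \qquad \theta \in (-\pi, \pi].
\]
% The smoothing factor maximizes |S| over the high frequencies not handled by coarsening:
\[
  \mu = \max_{\pi/2 \le |\theta| \le \pi} \bigl|\tilde{S}_h(\theta)\bigr|,
  \qquad \mu = \tfrac{1}{3} \ \text{for } \omega = \tfrac{2}{3}.
\]
```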

    COVID-19 vaccination willingness among people living with HIV in Shijiazhuang, China: a cross-sectional survey

    Objectives: The COVID-19 pandemic imposed an enormous disease and economic burden worldwide, and SARS-CoV-2 vaccination is essential to containing it. People living with HIV (PLWH) may be more vulnerable to severe COVID-19 outcomes; understanding their vaccination willingness and its influencing factors therefore helps in developing targeted vaccination strategies.
    Methods: A cross-sectional study was conducted between 15 June and 30 August 2022 in Shijiazhuang, China. Variables included socio-demographic characteristics, health status, HIV-related characteristics, knowledge of and attitudes toward COVID-19 vaccination, and COVID-19 vaccination status. Multivariable logistic regression was used to identify factors associated with COVID-19 vaccination willingness among PLWH.
    Results: A total of 1,428 PLWH were included, of whom 90.48% were willing to receive COVID-19 vaccination. PLWH were more likely to be unwilling to be vaccinated if they were female, reported fair or poor health status, had an allergic history or comorbidities, were unconvinced of or unsure about the effectiveness of vaccines, were unconvinced of or unsure about the safety of vaccines, were convinced of or unsure about whether COVID-19 vaccination would affect ART efficacy, or did not know of at least one type of domestic COVID-19 vaccine. Approximately 93.00% of PLWH had received at least one dose of a COVID-19 vaccine, and 213 PLWH (14.92%) reported at least one adverse reaction within 7 days.
    Conclusion: Our study found a relatively high willingness to receive COVID-19 vaccination among PLWH in Shijiazhuang. However, a small number of PLWH remained hesitant; more tailored policies or guidelines from the government should therefore be implemented to further increase the COVID-19 vaccination rate among PLWH.
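    For readers unfamiliar with the analysis, a multivariable logistic regression of willingness on candidate factors could be fit along the lines below. The file and column names are hypothetical placeholders; the actual survey variables and their coding are not reproduced here.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical file and column names, used only to illustrate the model form.
df = pd.read_csv("plwh_survey.csv")
model = smf.logit(
    "willing ~ C(sex) + C(health_status) + C(allergy_history) + C(comorbidity)"
    " + C(effectiveness_belief) + C(safety_belief) + C(art_concern)",
    data=df,
).fit()
print(model.summary())
print(np.exp(model.params))   # odds ratios for each factor
```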

    Prompt-to-afterglow transition of optical emission in a long gamma-ray burst consistent with a fireball

    Long gamma-ray bursts (GRBs), which mark the collapse of very massive stars at the ends of their lives, are produced by extremely relativistic jets colliding with the circumstellar medium. Huge amounts of energy are released both in the first few seconds, during the internal dissipation phase that powers the prompt emission, and in the subsequent self-similar jet-deceleration phase that produces afterglows observed across the broad-band electromagnetic spectrum. However, prompt optical emission from GRBs has rarely been detected, seriously limiting our understanding of the transition between the two phases. Here we report the detection of prompt optical emission from a gamma-ray burst (GRB 201223A) using a dedicated telescope array with high temporal resolution and wide time coverage. The early phase, coincident with the prompt gamma-ray emission, shows a luminosity greatly in excess of the extrapolation from the gamma rays, while the later luminosity bump is consistent with the onset of the afterglow. The clearly detected transition allows us to differentiate the physical processes contributing to early optical emission and to diagnose the composition of the jet.
    Comment: Authors' version of article published in Nature Astronomy, see their website for official version
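    For context, in the standard fireball picture the afterglow-onset time is set by jet deceleration; the textbook relation below (not a result quoted from the paper) shows why an early optical bump constrains the initial Lorentz factor.

```latex
% Deceleration radius and observed onset time of the afterglow for a jet of
% isotropic-equivalent energy E in a uniform medium of number density n:
\[
  R_{\mathrm{dec}} \simeq \left(\frac{3E}{4\pi n m_p c^{2}\,\Gamma_0^{2}}\right)^{1/3},
  \qquad
  t_{\mathrm{dec}} \simeq (1+z)\,\frac{R_{\mathrm{dec}}}{2\,\Gamma_0^{2} c}
  \;\propto\; (1+z)\left(\frac{E}{n}\right)^{1/3} \Gamma_0^{-8/3},
\]
% so the time of the optical luminosity bump yields an estimate of \Gamma_0.
```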